GT200: Nvidia GeForce GTX 280 analysis

Written by Tim Smalley

June 24, 2008 | 10:15


Shader Core

Diving down a little deeper, the 240 shader processors are split into ten texture (or thread, in compute mode) processing clusters.

These are then further split into three groups of eight shader processors, each combined with an instruction unit and 16KB of shared local memory – they're known as streaming multiprocessors (SMs), as we discussed earlier on.
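As a quick sanity check, the hierarchy described above multiplies out to the headline shader count. This is only an illustrative sketch; the constant names are our own labels and not Nvidia terminology.

```python
# GT200 shader hierarchy, using the figures quoted in the text.
TPC_COUNT = 10    # texture (or thread) processing clusters per chip
SMS_PER_TPC = 3   # streaming multiprocessors per cluster
SPS_PER_SM = 8    # shader processors per streaming multiprocessor

sm_count = TPC_COUNT * SMS_PER_TPC   # streaming multiprocessors on the chip
sp_count = sm_count * SPS_PER_SM     # shader processors on the chip

print(f"{sm_count} SMs, {sp_count} shader processors")
```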

This is where most of the 3D graphics (and just about all of the general compute) performance comes from; obviously, there are other factors and limitations that determine performance in real applications, but you can think of it as the engine room.

On top of this, each texture processing cluster also has eight texture units that can handle eight texture addresses and eight bilinear texture filters per clock. These run at 602MHz in the GeForce GTX 280 and deliver a combined texture fill-rate of 48 gigatexels per second.
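The fill-rate figure follows directly from the unit count and clock speed. The short sketch below redoes the arithmetic, assuming each texture unit delivers one bilinear-filtered texel per clock; the names are ours.

```python
# Texture fill-rate of the GeForce GTX 280 from the figures in the text.
TPC_COUNT = 10               # texture processing clusters
TEXTURE_UNITS_PER_TPC = 8    # texture units per cluster
TEXTURE_CLOCK_MHZ = 602      # clock the texture units run at

texture_units = TPC_COUNT * TEXTURE_UNITS_PER_TPC

# Assumption: one bilinear-filtered texel per unit per clock.
fill_rate_gtexels = texture_units * TEXTURE_CLOCK_MHZ / 1000

print(f"{texture_units} texture units, {fill_rate_gtexels:.2f} gigatexels/s")
```

The result comes out a touch over the 48 gigatexels per second quoted above, which is simply the rounded figure.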

Inside each streaming multiprocessor, there is also a multi-banked register file, which has been doubled in size compared to the GeForce 8- and 9-series GPUs. The reason for this, Nvidia says, is that there were situations, both in 3D and general compute operations, where the GPU ran out of register space and had to swap data out to memory. This is not ideal because the SM can stall while it waits for that data to come back from memory, and that can seriously hurt performance.

The instruction unit manages groups of 32 parallel threads – Nvidia calls these 'warps' – and each instruction unit can handle 32 warps, making a total of 1,024 threads in flight per streaming multiprocessor. As there are 30 SMs inside GT200, this means the chip can keep up to 30,720 threads in flight at any given time – up from G80's peak of 768 threads per SM and 12,288 for the whole chip.
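The thread counts are easy to verify, and working backwards from the G80 figures also gives its warps per SM and SM count. A minimal sketch, assuming G80 uses the same 32-thread warp size; the names are ours.

```python
THREADS_PER_WARP = 32

# GT200 figures quoted in the text.
GT200_WARPS_PER_SM = 32
GT200_SM_COUNT = 30

gt200_threads_per_sm = THREADS_PER_WARP * GT200_WARPS_PER_SM
gt200_threads_total = gt200_threads_per_sm * GT200_SM_COUNT

# G80 figures quoted in the text; the rest is derived from them.
G80_THREADS_PER_SM = 768
G80_THREADS_TOTAL = 12_288

# Assumption: G80 warps are also 32 threads wide.
g80_warps_per_sm = G80_THREADS_PER_SM // THREADS_PER_WARP
g80_sm_count = G80_THREADS_TOTAL // G80_THREADS_PER_SM

print(f"GT200: {gt200_threads_per_sm} threads/SM, {gt200_threads_total} threads/chip")
print(f"G80:   {g80_warps_per_sm} warps/SM across {g80_sm_count} SMs")
```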

A GT200 Texture Processing Cluster


Warps are an important aspect of the way Nvidia's unified architecture functions – operations are handled in blocks of 32 threads and so one warp is the granularity of the chip's branching capabilities.

If a particular thread is waiting for a high latency task (such as a texture read or memory access) to complete, the multithreaded instruction unit can switch to another warp at no cost – this helps to hide latency and prevent the stream processors from sitting idle or, even worse, stalling.
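To see why having more warps in flight helps, here is a deliberately simple toy model of the idea. It is not Nvidia's scheduler: we assume every warp issues one instruction and then waits a fixed number of cycles on memory, and the latency value is hypothetical rather than a GT200 figure.

```python
def simulate_utilisation(num_warps, memory_latency, cycles=10_000):
    """Fraction of cycles in which the toy instruction unit issues work.

    Toy model (an assumption, not GT200's real behaviour): each warp issues
    one instruction, then stalls for `memory_latency` cycles. Switching to
    another ready warp is free.
    """
    ready_at = [0] * num_warps  # cycle at which each warp can issue again
    busy_cycles = 0
    for cycle in range(cycles):
        for warp in range(num_warps):
            if ready_at[warp] <= cycle:
                # Issue from the first ready warp; it then waits on memory.
                ready_at[warp] = cycle + 1 + memory_latency
                busy_cycles += 1
                break
    return busy_cycles / cycles


# Hypothetical latency of 9 cycles, chosen only to keep the numbers tidy.
for warps in (1, 5, 10, 32):
    print(warps, "warps:", simulate_utilisation(warps, memory_latency=9))
```

In this model the unit is idle most of the time with a single warp, and stays fully occupied once there are enough warps to cover the wait.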

One of the standout features on the long list of architectural improvements made in GT200 is the significant increase in the size of the stream out buffers. Nvidia documentation claims that the internal output buffer structures have been upsized by a factor of six, which should help to make the geometry shader more accessible to developers who want to use it to generate lots of data in tasks like tessellation.

How fast is the new shader core though? Well, there's only one way to find out...


D3D10 - 3DMark Vantage: POM Shader Test

Parallax Occlusion Mapping Shader Test, Extreme Settings

  • Nvidia GeForce GTX 280 1GB: 30.6 fps
  • Nvidia GeForce 9800 GX2 1GB: 23.4 fps
  • ATI Radeon HD 3870 X2 1GB: 16.2 fps
  • Nvidia GeForce 9800 GTX 512MB: 11.8 fps
  • Nvidia GeForce 8800 Ultra 768MB: 11.6 fps

D3D10 - RightMark 3D 2.0: Steep Parallax Mapping

1920x1200 0xAA 0xAF, High Effect Detail

  • Nvidia GeForce GTX 280 1GB: 278.8 fps
  • Nvidia GeForce 9800 GTX 512MB: 122.5 fps
  • Nvidia GeForce 8800 Ultra 768MB: 120.4 fps
  • ATI Radeon HD 3870 X2 1GB: 75.6 fps

D3D10 - 3DMark Vantage: Perlin Noise

Perlin Noise Test, Extreme Settings

  • Nvidia GeForce GTX 280 1GB: 55.3 fps
  • Nvidia GeForce 9800 GX2 1GB: 47.7 fps
  • ATI Radeon HD 3870 X2 1GB: 45.1 fps
  • Nvidia GeForce 8800 Ultra 768MB: 25.1 fps
  • Nvidia GeForce 9800 GTX 512MB: 24.1 fps

D3D9.0c - 3DMark06: Perlin Noise

Perlin Noise Test, Default Settings

  • Nvidia GeForce 9800 GX2 1GB: 378.3 fps
  • ATI Radeon HD 3870 X2 1GB: 349.7 fps
  • Nvidia GeForce GTX 280 1GB: 315.4 fps
  • Nvidia GeForce 9800 GTX 512MB: 194.9 fps
  • Nvidia GeForce 8800 Ultra 768MB: 178.5 fps

D3D10 - 3DMark Vantage: GPU Particles

Perlin Noise Test, Default Settings

  • Nvidia GeForce GTX 280 1GB: 36.5 fps
  • Nvidia GeForce 8800 Ultra 768MB: 28.8 fps
  • Nvidia GeForce 9800 GX2 1GB: 28.1 fps
  • Nvidia GeForce 9800 GTX 512MB: 25.7 fps
  • ATI Radeon HD 3870 X2 1GB: 17.4 fps

Aside from the strange result for the Radeon HD 3870 X2 in RightMark 3D 2.0's Steep Parallax Mapping test (which is probably down to the card being slightly texture limited in some respects), things add up: the GeForce GTX 280 is a brute when it comes to general pixel shader tasks. Of course, much of this is down to the increased shader count, but some of the performance increase can be attributed to the larger register file.

Interestingly though, both the GeForce 9800 GX2 and Radeon HD 3870 X2 are faster in 3DMark06's Perlin Noise test. This is probably down to the fact that the shaders aren't quite as complex as they are in 3DMark Vantage's updated version of the test and, as a result, the register files aren't quite so full, which negates any potential benefit from the enlarged register file.
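To put the results in perspective, the sketch below works out how much faster the GeForce GTX 280 is than the GeForce 9800 GTX in each test, using the frame rates from the charts above. The short test labels and variable names are ours.

```python
# Frame rates (fps) for the two single-GPU GeForce cards, from the charts.
results = {
    "3DMark Vantage POM Shader": {"GTX 280": 30.6, "9800 GTX": 11.8},
    "RightMark 3D 2.0 Steep Parallax Mapping": {"GTX 280": 278.8, "9800 GTX": 122.5},
    "3DMark Vantage Perlin Noise": {"GTX 280": 55.3, "9800 GTX": 24.1},
    "3DMark06 Perlin Noise": {"GTX 280": 315.4, "9800 GTX": 194.9},
    "3DMark Vantage GPU Particles": {"GTX 280": 36.5, "9800 GTX": 25.7},
}

# Speed-up of the GTX 280 over the 9800 GTX in each test.
speedups = {
    test: fps["GTX 280"] / fps["9800 GTX"] for test, fps in results.items()
}

for test, speedup in speedups.items():
    print(f"{test}: {speedup:.2f}x")
```

The gap is widest in the Vantage POM Shader test and noticeably narrower in 3DMark06's Perlin Noise test, which fits the picture painted above.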